Search Results: "glandium"

23 February 2012

Mike Hommey: Fun with weak symbols

Consider the following foo.c source file:
extern int bar() __attribute__((weak));
int foo()
{
    return bar();
}
And the following bar.c source file:
int bar()
{
    return 42;
}
Compile both sources:
$ gcc -o foo.o -c foo.c -fPIC
$ gcc -o bar.o -c bar.c -fPIC
In the resulting object for foo.c, we have an undefined symbol reference to bar. That symbol is marked weak. In the resulting object for bar.c, the bar symbol is defined and not weak. What we expect from linking both objects is that the weak reference is fulfilled by the existence of the bar symbol in the second object, and that in the resulting binary, the foo function calls bar.
$ ld -shared -o test1.so foo.o bar.o
And indeed, this is what happens.
$ objdump -T test1.so | grep "\(foo\|bar\)"
0000000000000260 g DF .text 0000000000000007 foo
0000000000000270 g DF .text 0000000000000006 bar
What do you think happens if the bar.o object file is embedded in a static library?
$ ar cr libbar.a bar.o
$ ld -shared -o test2.so foo.o libbar.a
$ objdump -T test2.so | grep "\(foo\|bar\)"
0000000000000260 g DF .text 0000000000000007 foo
0000000000000000 w D *UND* 0000000000000000 bar
The bar function is now undefined and there is a weak reference for the symbol. Calling foo will crash at runtime. This is apparently a feature of the linker. If anyone knows why, I would be interested to hear about it. Good to know, though.
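If you actually need bar to be pulled out of the static library, two workarounds seem plausible (untested sketches on my part, not from the original post): force the whole archive to be linked in, or tell the linker to treat bar as a strong undefined symbol with -u so it extracts the archive member defining it:
$ ld -shared -o test3.so foo.o --whole-archive libbar.a --no-whole-archive
$ ld -shared -o test4.so foo.o -u bar libbar.a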

18 February 2012

Mike Hommey: Debian Mozilla news

Here are the few noteworthy news items about Mozilla packages in Debian:

Mike Hommey: How to waste a lot of space without knowing

const char *foo = "foo";
This was recently mentioned on bugzilla, and the problem is usually underestimated, so I thought I would give some details about what is wrong with the code above. The first common mistake here is to believe foo is a constant. It is a pointer to a constant. In practical ELF terms, this means the pointer lives in the .data section, and the string constant in .rodata. The following code defines a constant pointer to a constant:
const char * const foo = "foo";
The above code will put both the pointer and the string constant in .rodata. But keeping a constant pointer to a constant string is pointless. In the above examples, the string itself is 4 bytes (3 characters and a zero termination). On 32-bit architectures, a pointer is 4 bytes, so storing the pointer and the string takes 8 bytes: a 100% overhead. On 64-bit architectures, a pointer is 8 bytes, putting the total weight at 12 bytes: a 200% overhead. The overhead is always the same size, though, so the longer the string, the smaller the overhead relative to the string size.

But there is another, not so well known, hidden overhead: relocations. When loading a library in memory, its base address varies depending on how many other libraries were loaded beforehand, or depending on the use of address space layout randomization (ASLR). This also applies to programs built as position independent executables (PIE). For pointers embedded in the library or program image to point to the appropriate place, they need to be adjusted to the base address where the program or library is loaded. This process is called relocation. The relocation process requires information which is stored in .rel.* or .rela.* ELF sections. Each pointer needs one relocation. The relocation overhead varies depending on the relocation type and the architecture. REL relocations use 2 words, and RELA relocations use 3 words, where a word is 4 bytes on 32-bit architectures and 8 bytes on 64-bit architectures. On x86 and ARM, to mention the most popular 32-bit architectures nowadays, REL relocations are used, which makes a relocation weigh 8 bytes. This puts the pointer overhead for our example string at 12 bytes, or 300% of the string size. On x86-64, RELA relocations are used, making a relocation weigh 24 bytes! This puts the pointer overhead for our example string at 32 bytes, or 800% of the string size!

Another hidden cost of using a pointer to a constant is that every time it is used in the code, there will be a pointer dereference. A function as simple as
const char *bar() { return foo; }
weighs one more instruction when foo is defined as const char *. On x86, that instruction weighs 2 bytes. On x86-64, 3 bytes. On ARM, 4 bytes (or 2 in Thumb). That weight can vary depending on the additional instructions required, but you get the idea: using a pointer to a constant also adds overhead to the code, both in time and space. Also, if the string is defined as a constant instead of being used as a literal in the code, chances are it's used several times, multiplying the number of such instructions. Update: Note that in the case of const char * const, the compiler will optimize these instructions and avoid reading the pointer, since it's never going to change.

The symbol for foo is also exported, making it available from other libraries or programs, which might not be required, but also adds its own overhead: an entry in the symbols table (5 words), an entry in the string table for the symbol name (strlen("foo") + 1), an entry in the symbols hash chain table (4 bytes if only one type of hash table (sysv or GNU) is present, 8 if both are present), and possibly an entry in the symbols hash bucket table, depending on the other exported symbols (4 or 8 bytes, as for the chain table). It can also affect the size of the bloom filter table in the GNU symbol hash table.

So here we are, with a seemingly tiny 3-character string possibly taking 64 bytes or more! Now imagine what happens when you have an array of such tiny strings. This doesn't only apply to strings, either; it applies to any kind of global pointer to constants. In conclusion, using a definition like
const char *foo = "foo";
is almost never what you want. Instead, you want to use one of the following forms:
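The original list of recommended forms is missing from this excerpt, but one form consistent with the reasoning above (an assumption on my part, not necessarily the author's exact list) is a plain character array, which stores the string directly in .rodata with no pointer, no relocation and, thanks to static, no exported symbol:
static const char foo[] = "foo";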

1 November 2011

Philipp Kern: Useful Firefox extensions

Many people around me switched to Chrome or Chromium. I also used it for a bit, but I was a bit disappointed about the extensions available. To show why, here's a list of the extensions I've currently installed:
If Firefox on Android were quicker to start and faster overall, I might even use it there. But as-is it's not very useful. Sadly this also means that I can't use Firefox Sync on my phone and as I don't use Chrome on my desktop I also can't use Chrome to Phone. So I usually go and build a QR code on my laptop and read that with Android's Barcode Scanner.

Of course I'm actually using Iceweasel and I'm very grateful for Mike Hommey's efforts to track the release channel on mozilla.debian.net.

14 September 2011

Mike Hommey: Building a custom kernel for the Nexus S

There are several reasons why someone would want to build a custom kernel for their Android phone. In my case, this is because I wanted performance counters (those used by the perf tool that comes with the kernel source). In Julian Seward's case, he wanted swap support to overcome the limited memory amount on these devices in order to run valgrind. In both cases, the usual suspects (AOSP, CyanogenMod) don't provide the wanted features in prebuilt ROMs. There are also several reasons why someone would NOT want to build a complete ROM for their Android phone. In my case, the Nexus S is what I use to work on Firefox Mobile, but it is also my actual mobile phone. It's a quite painful and long process to create a custom ROM, and another long (but arguably less painful thanks to ROM manager) process to back up the phone data, install the ROM, and restore the phone data. And if you happen to like or use the proprietary Google Apps that don't come with the AOSP sources, you need to add more steps. There are however tricks that allow building a custom kernel for the Nexus S and using it with the system already on the phone. Please note that the following procedure has only been tested on two Nexus S with a 2.6.35.7-something kernel (one with a stock ROM, but unlocked, and another one with an AOSP build). Also please note that there are various ways to achieve many of the steps in this procedure, but I'll only mention one (or two in a few cases). Finally, please note some steps rely on your device being rooted. There may be ways to do without, but I'm pretty sure it requires an unlocked device at the very least. This post covers neither rooting nor unlocking.

Preparing a build environment

To build an Android kernel, you need a cross-compiling toolchain. Theoretically, any will do, provided it targets ARM. I just used the one coming in the Android NDK:
$ wget http://dl.google.com/android/ndk/android-ndk-r6b-linux-x86.tar.bz2
$ tar -jxf android-ndk-r6b-linux-x86.tar.bz2
$ export ARCH=arm
$ export CROSS_COMPILE=$(pwd)/android-ndk-r6b/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-
For the latter, you need to use a directory path containing prefixed versions (such as arm-eabi-gcc or arm-linux-androideabi-gcc), and include the prefix, but not "gcc". You will also need the adb tool coming from the Android SDK. You can install it this way:
$ wget http://dl.google.com/android/android-sdk_r12-linux_x86.tgz
$ tar -zxf android-sdk_r12-linux_x86.tgz
$ android-sdk-linux_x86/tools/android update sdk -u -t platform-tool
$ export PATH=$PATH:$(pwd)/android-sdk-linux_x86/platform-tools
Building the kernel

For the Nexus S, one needs to use the Samsung Android kernel tree, which happens to be unavailable at the moment of writing due to the kernel.org outage. Fortunately, there is a clone used for the B2G project, which also happens to contain the necessary cherry-picked patch to add support for the PMU registers on the Nexus S CPU that are needed for the performance counters.
$ git clone -b devrom-2.6.35 https://github.com/cgjones/samsung-android-kernel
$ cd samsung-android-kernel
You can then either start from the default kernel configuration:
$ make herring_defconfig
or use the one from the B2G project, which enables interesting features such as oprofile:
$ wget -O .config https://raw.github.com/cgjones/B2G/master/config/kernel-nexuss4g
From there, you can use make menuconfig or similar commands to further configure your kernel. One of the problems you'd first encounter when booting such a custom kernel image is that the bcm4329 driver module that is shipped in the system partition (and not in the boot image) won't match the kernel, and won't be loaded. The unfortunate consequence is the lack of WiFi support. One way to overcome this problem is to overwrite the kernel module in the system partition, but I didn't want to have to deal with switching modules when switching kernels. There is however a trick allowing the existing module to be loaded by the kernel: compile a kernel with the same version string as the one already on the phone. Please note this only really works if the kernel really is about the same. If there are differences in the binary interface between the kernel and the modules, it will fail in possibly dangerous ways. To use that trick, you first need to know what kernel version is running on your device. Settings > About phone > Kernel version will give you that information on the device itself. You can also retrieve that information with the following command:
$ adb shell cat /proc/version
With my stock ROM, this looks like the following:
Linux version 2.6.35.7-ge382d80 (android-build@apa28.mtv.corp.google.com) (gcc version 4.4.3 (GCC) ) #1 PREEMPT Thu Mar 31 21:11:55 PDT 2011
In the About phone information, it looks like:
2.6.35.7-ge382d80
android-build@apa28
The important part above is -ge382d80, and that is what we will be using in our kernel build. Make sure the part preceding -ge382d80 does match the output of the following command:
$ make kernelversion
The trick is to write that -ge382d80 in a .scmversion file in the kernel source tree (obviously, you need to replace -ge382d80 with whatever your device has):
$ echo -ge382d80 > .scmversion
The kernel can now be built:
$ make -j$(($(grep -c processor /proc/cpuinfo) * 3 / 2))
The -j part is the general rule I use when choosing the number of parallel processes make can use at the same time. You can pick whatever suits you better. Before going further, we need to get back to the main directory:
$ cd ..
Getting the current boot image

The Android boot image living in the device doesn't contain only a kernel. It also contains a ramdisk with a few scripts and binaries that start the system initialization. As we will be using the ramdisk coming with the existing kernel, we need to get that ramdisk from the device flash memory:
$ adb shell cat /proc/mtd | awk -F'[:"]' '$3 == "boot" {print $1}'
The above command will print the mtd device name corresponding to the boot partition. On the Nexus S, this should be mtd2.
$ adb shell
$ su
# dd if=/dev/mtd/mtd2 of=/sdcard/boot.img bs=4096
2048+0 records in
2048+0 records out
8388608 bytes transferred in x.xxx secs (xxxxxxxx bytes/sec)
# exit
$ exit
In the above command sequence, replace mtd2 with whatever the previous command printed for you. Now, you can retrieve the boot image:
$ adb pull /sdcard/boot.img
Creating the new boot image

We first want to extract the ramdisk from that boot image. There are various tools to do so, but for convenience, I took unbootimg, on github, and modified it slightly to seamlessly support the page size on the Nexus S. For convenience as well, we'll use mkbootimg even if fastboot is able to create boot images. Building unbootimg, as well as the other tools, relies on the Android build system, but since I didn't want to go through setting it up, I figured out a minimalistic way to build the tools:
$ git clone https://github.com/glandium/unbootimg.git
$ git clone git://git.linaro.org/android/platform/system/core.git
The latter is a clone of git://android.git.kernel.org/platform/system/core.git, which is down at the moment.
$ gcc -o unbootimg/unbootimg unbootimg/unbootimg.c core/libmincrypt/sha.c -Icore/include -Icore/mkbootimg
$ gcc -o mkbootimg core/mkbootimg/mkbootimg.c core/libmincrypt/sha.c -Icore/include
$ gcc -o fastboot core/fastboot/{protocol,engine,bootimg,fastboot,usb_linux,util_linux}.c core/libzipfile/{centraldir,zipfile}.c -Icore/mkbootimg -Icore/include -lz
Once the tools are built, we can extract the various data from the boot image:
$ unbootimg/unbootimg boot.img
section sizes incorrect
kernel 1000 2b1b84
ramdisk 2b3000 22d55
second 2d6000 0
total 2d6000 800000
...but we can still continue
Don't worry about the error messages about incorrect section sizes if it tells you "...but we can still continue". The unbootimg program creates three files: All that is left to do is to generate the new boot image:
$ eval ./mkbootimg $(sed s,boot.img-kernel,samsung-android-kernel/arch/arm/boot/zImage, boot.img-mk)
Booting the image

There are two ways you can use the resulting boot image: one-time boot or flash. If you want to go for the latter, it is best to actually do both, starting with the one-time boot, to be sure you won't be leaving your phone useless (though recovery is there to the rescue, but is not covered here). First, you need to get your device in fastboot mode, a.k.a. the boot-loader:
$ adb reboot bootloader
Alternatively, you can power it off, and power it back on while pressing the volume up button. Once you see the boot-loader screen, you can test the boot image with a one-time boot:
$ ./fastboot boot boot.img
downloading 'boot.img'...
OKAY [ 0.xxxs]
booting...
OKAY [ 0.xxxs]
finished. total time: 0.xxxs
As a side note, if fastboot sits "waiting for device", it either means your device is not in fastboot mode (or is not connected), or that you have permission issues on the corresponding USB device in /dev. Your device should now be starting up, and eventually be usable under your brand new kernel (and WiFi should be working, too). Congratulations. If you want to use that kernel permanently, you can now flash it after going back to the bootloader:
$ adb reboot bootloader
$ ./fastboot flash boot boot.img
sending 'boot' (2904 KB)...
OKAY [ 0.xxxs]
writing 'boot'...
OKAY [ 0.xxxs]
finished. total time: 0.xxxs
$ ./fastboot reboot
Voilà.

9 September 2011

Mike Hommey: Initial VMFS 5 support

Today I added initial VMFS 5 support to vmfs-tools. For the most part, VMFS 5 is VMFS 3 with new features added, so the tools should just work as before; but this initial support is very limited: In related news, while the git repository here is kept alive, I also pushed it to github. The main reason I did so is the issue tracker. Update: It turns out the small file support makes vmfs-tools crash when accessing files bigger than 256GB, because the assumption made when reverse engineering was wrong and clashes with how files bigger than 256GB are implemented. It also turns out large single extent volumes may be working already, because it looks like it was only about tuning an existing value, like the smaller sub-blocks and increased file count. Update 2: Latest master now supports small files without crashing on files bigger than 256GB.

29 August 2011

Mike Hommey: Extreme tab browsing

I have a pathological use of browser tabs: I use a lot of them. A lot is probably an understatement. We could say I use them as bookmarks of things I need to track. A couple weeks ago, I was saying I had around two hundred tabs opened. I now actually have much more.

It affected startup until I discovered that setting the browser.sessionstore.max_concurrent_tabs pref to 0 was making things much better by only loading tabs when they are selected. This preference has/will become browser.sessionstore.restore_on_demand. However, since I only start my main browser once a day, while other applications start and while I begin to read email, I hadn't noticed that this was still heavily affecting startup time: about:startup tells me reaching the sessionRestored state takes seven seconds, even on a warm startup. It also affects memory usage, because even when tabs are only loaded on demand, there is a quite big overhead for each tab. And more importantly, it gets worse with time. And I think the user interface is actively making it worse.

So, to get an idea how bad things were in my session, I wrote a little restartless extension. After installing it, you can go to the about:tabs url to see the damage on your session. Please note that the number of groups is currently wrong until you open the tab grouping interface. This is what the extension has to say about my session 2 days ago, right after a startup: The first thing to note is that when I filed the memory bug 4 days earlier, I had a bit more than 470 tabs in that session. You can see 4 days later, I now have 555 tabs (if excluding the about:tabs tab). The second thing to note is something I suspected because it's so easy to get there: a lot of the tabs are opened on the same address.

Since Firefox 4.0, if I'm not mistaken, there is a great feature in the awesomebar that allows jumping to an existing tab matching what you type in the urlbar. That is very useful, and I use it a lot. However, there are a lot of cases where it's not as useful as it could be. One of the addresses I visit a lot is http://buildd.debian.org/iceweasel. It gives me the build status of the latest iceweasel package I uploaded to Debian unstable. That url is particularly well known in my browsing history, and is the first hit when I type buildd in the urlbar (actually, even typing b brings it first). Unfortunately, that url redirects to https://buildd.debian.org/status/package.php?p=iceweasel through an HTTP redirection. I say unfortunately because when I type buildd in the urlbar, I get 6 suggestions for urls in the form http://buildd.debian.org/package (I also watch other packages' build status), and the suggestion to switch to the existing tab for what the first hit would get me to is 7th. Guess what? The suggestion list only shows 6 items; you have to scroll to see the 7th. The result is that I effectively have fifteen tabs open on that url.

I also keep a lot of bugzilla.mozilla.org bugs open in different tabs. The extension tells me there are 255 of them for 166 unique bugs. Largely, the duplicate bug tabs are due to having these bugs open in some tab, but accessing the same bugs from somewhere else, usually a dependent bug or TBPL. I also have 5 tabs opened on my request queue. I usually get there by going to the bugzilla home page and clicking on the My Requests link. And I have several tabs opened on the same bug lists. For the same reason.

When I started using tab groups, I split them into very distinct groups. Basically, one for Mozilla, one for Debian, one for stuff I want to follow (usually blog posts I want to follow comments from), and one for the rest. While I was keeping up with grouping at the beginning, I don't anymore, and the result is that each group is now a real mess.

Firefox has hundreds of millions of users. It's impossible to create a user experience that works for everyone. One thing is sure, it doesn't work for me. My usage is probably very wrong at different levels, but I don't feel my browser is encouraging me to use it better, except by making my number of opened tabs explode to an unmanageable level (I already have 30 tabs more than when I started writing this post 2 days ago). There are a few other things I would like to know about my usage that my extension hasn't told me yet, either because it doesn't tell, or because I haven't looked: Reflecting on my usage patterns, I think a few improvements, either in the stock browser, or through extensions, could make my browsing easier: Maybe these exist as extensions, I don't know. It's hard to find very specific things like that through an add-on search (though I haven't searched very hard). [Looks like there is an experiment for the auto tab grouping part]

I think it would also be interesting to have something like Test Pilot, but for users that want to know the answer to "How do I use my browser?". As I understand it, Test Pilot can show individual user data, but it only can do so if there is such data, and you can't get data for past studies you didn't take. In my case, I'm not entirely sure that, apart from the pinned tabs, I use the tab bar a lot. And even for pinned tabs, most of the time I use keyboard shortcuts. I'm not using the menu button that much either. I already removed the url and search bar (most of the time) with LessChrome HD. Maybe I could go further and use the full window for web browsing.

30 July 2011

Mike Hommey: -feliminate-dwarf2-dups FAIL

DWARF-2 is a format to store debugging information. It is used on many ELF systems such as GNU/Linux. With the way things are compiled, there is a lot of redundant information in the DWARF-2 sections of an ELF binary. Fortunately, there is an option to gcc that helps deal with the redundant information and downsizes the DWARF-2 sections of ELF binaries. This option is -feliminate-dwarf2-dups. Unfortunately, it doesn't work with C++. With -g alone, libxul.so is 468 MB. With -g -feliminate-dwarf2-dups, it is 1.5 GB. FAIL. The good news is that, as stated in the message linked above, -gdwarf-4 does indeed help reduce debugging information size. libxul.so, built with -gdwarf-4, is 339 MB. This however requires gcc 4.6 and a pretty recent gdb.
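As an illustration (my own sketch, not from the original post), one way to compare the effect on a single object file, assuming a gcc recent enough to support -gdwarf-4:
$ g++ -g -c foo.cpp -o foo-dwarf2.o
$ g++ -gdwarf-4 -c foo.cpp -o foo-dwarf4.o
$ readelf -S foo-dwarf2.o | grep debug_info
$ readelf -S foo-dwarf4.o | grep debug_info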

23 July 2011

Mike Hommey: debian-rules-missing-recommended-target, dh, and dumb make

Lintian now has a warning for debian/rules missing build-arch and build-indep targets. As a dh user, I was surprised that some of my dh-using packages had this problem. And when looking at the source, I remembered how I came to this: GNU make is stupid. Considering the following excerpt of the GNU make manual:
.PHONY
The prerequisites of the special target .PHONY are considered to be phony targets. When it is time to consider such a target, make will run its recipe unconditionally, regardless of whether a file with that name exists or what its last-modification time is.
And considering the following debian/rules:
.PHONY: build
%:
        dh $@
What do you think happens when you run debian/rules build in a directory containing a build file or directory?
make: Nothing to be done for `build'.
However, an explicit rule, like the following, works:
.PHONY: build
build:
        dh $@
It happens that many of the packages I maintain contain a build subdirectory in their source. As such, to work around the aforementioned issue, I just declared the dh rules explicitly, as in:
.PHONY: build binary binary-arch binary-indep (...)
build binary binary-arch binary-indep (...):
        dh $@
And this obviously doesn't scale for new rules such as build-arch and build-indep. To be future-proof, I'll use the following instead:
.PHONY: build
build %:
        dh $@
I don't know why I didn't do that the first time.

7 July 2011

Mike Hommey: Prepare yourself for the upcoming changes on the mozilla.debian.net repository

With the upcoming changes in the beta and aurora channels (6.0 is going to reach beta, and 7.0 to reach aurora), the mozilla.debian.net repository is going to adapt, and drop versioned archives in favor of channel archives. The channel archives for beta and aurora already existed; what is new is the release channel. Currently, that channel contains Iceweasel 5.0, but as soon as 6.0 is released, that's what the release channel will contain. To summarize, if you added lines containing iceweasel-x.0, where x is 4, 5, or 6, to your /etc/apt/sources.list, you need to update them to the corresponding channel (don't forget 4.0 is dead, you should use release instead). The iceweasel-5.0 and iceweasel-6.0 archives still exist at the moment but will be dropped as soon as the new aurora and beta releases are ready, which should be real soon now (only waiting for actual upstream releases). As a somewhat related note, it should be noted that Iceweasel 5.0 should (finally) enter Debian unstable on the 15th of July, at which point the latest 6.0 beta will also be uploaded to Debian experimental. It is still unclear how long it will take for Iceweasel 5.0 to reach Debian testing/wheezy, because of all the reverse dependencies, but when that happens, we'll also be able to push it to backports.debian.org.
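For illustration only (these are hypothetical sources.list lines, since the exact suite and component names are not given in this excerpt; check the instructions on mozilla.debian.net), the switch would look something like:
# before, a versioned archive:
deb http://mozilla.debian.net/ squeeze-backports iceweasel-5.0
# after, the corresponding channel archive:
deb http://mozilla.debian.net/ squeeze-backports iceweasel-release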

23 June 2011

Steve Kemp: So you want to install the most recent firefox?

If you've been following new releases you'll see there is a new Firefox browser out, version 5.0. This will almost certainly make its way into Debian's experimental tree soon, but that doesn't help users of the Debian Stable release. The only sane option for those users (such as myself), without a backport, is to install locally. So I did the obvious thing, I made /opt/firefox then installed the binary release into it. Then I found that it was good, lovely and fast. Unfortunately the system firefox and the local firefox are not really compatible. Run the local one, then click on a link in the gnome terminal and it wants to open the system one. Ho hum. The solution: Of course this being Debian we don't want to do that. So instead here is a package that will let you do that: Download. Build. Install. If you install your local package to a location different than /opt/firefox update the configuration file /etc/firefox/firefox.conf to point to it. Possibly useful? ObQuote: "I could help you cross your yard." - Up

22 June 2011

Mike Hommey: Iceweasel 5.0 in experimental

I just pushed Iceweasel 5.0 to Debian experimental. Why not unstable, some will ask? Well, because we still need to give some time after a first notice before breaking plenty of packages (thanks to Julien Cristau for the MBF, by the way). I also discontinued the Iceweasel 4.0 backport for Squeeze, as Iceweasel 4.0 won't be receiving security updates. Speaking of security updates, 3.6.18 was also made available on mozilla.debian.net for Wheezy, Squeeze and Lenny. However, I still have to backport the necessary patches to 3.5 in Squeeze and 3.0 in Lenny. My real life schedule wasn't compatible with the security release schedule, so I got late on the security backport train. In the coming weeks, there will also be some additional changes to the mozilla.debian.net repository, but I'll give more details when that happens.

21 May 2011

Mike Hommey: Iceweasel 5.0b2

would have been released today if mozilla.debian.net was responding. But it's moving to a new server.

13 May 2011

Mike Hommey: Debian Squeeze + btrfs = FAIL

Executive summary: Don't use btrfs on Debian Squeeze.
Longer summary: Don't use btrfs RAID with the kernel Debian Squeeze comes with. About six months ago, I set up a new server to handle this web site, mail, and various other things. The system and most services (including web and mail) were set up to use an MD RAID 1 array across two small partitions on two separate disks, and the remaining space was set up as three different btrfs file systems: Three days ago, this happened:
May 10 10:18:04 goemon kernel: [3545898.548311] ata4: hard resetting link
May 10 10:18:04 goemon kernel: [3545898.867556] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May 10 10:18:04 goemon kernel: [3545898.874973] ata4.00: configured for UDMA/33
followed by other ATA-related messages, then garbage such as:
May 10 10:18:07 goemon kernel: [3545901.28123] sd3000 d]SneKy:AotdCmad[urn][ecitr
May 10 10:18:07 goemon kernel: 4[550.821 ecio es aawt es ecitr i e)
May 10 10:18:07 goemon kernel: 6[550.824     20 00 00 00 00 00 00 00 <>3491225     16 44 <>3491216]s ::::[d]Ad es:N diinlsneifrain<>3491216]s ::::[d]C:Ra(0:2 00 03 80 06 0
3491217]edrqet / ro,dvsb etr2272
May 10 10:18:07 goemon kernel: 3[550.837 ad:sb:rshdln etr2252
May 10 10:18:07 goemon kernel: 6[551214]s ::::[d]Rsl:hsbt=I_Kdiebt=RVRSNE<>3491215]s ::::[d]SneKy:AotdCmad[urn][ecitr
May 10 10:18:07 goemon kernel: 4[550.833 ecitrsnedt ihsnedsrpos(nhx:<>3491216]    7 b0 00 00 c0 a8 00 00 0
Then later on:
May 10 12:01:18 goemon kernel: [3552089.226147] lost page write due to I/O error on sdb4
May 10 12:01:18 goemon kernel: [3552089.226312] lost page write due to I/O error on sdb4
May 10 12:10:14 goemon kernel: [3552624.625669] btrfs no csum found for inode 23642 start 0
May 10 12:10:14 goemon kernel: [3552624.625783] btrfs no csum found for inode 23642 start 4096
May 10 12:10:14 goemon kernel: [3552624.625884] btrfs no csum found for inode 23642 start 8192
etc., and more garbage. At that point, I wanted to shut down the server, check the hardware, and reboot. Shutdown didn't want to proceed completely. Btrfs just froze on the sync happening during the shutdown phase, so I had to power off violently. Nothing seemed really problematic on the hardware end, and after a reboot, both disks were properly working. The MD RAID would resynchronize, and the btrfs filesystems would be automatically mounted. It would work for a while, until such things could be seen in the logs, with more garbage as above in between:
May 10 14:41:18 goemon kernel: [ 1253.455545] __ratelimit: 35363 callbacks suppressed
May 10 14:45:04 goemon kernel: [ 1478.717749] parent transid verify failed on 358190825472 wanted 42547 found 42525
May 10 14:45:04 goemon kernel: [ 1478.717936] parent transid verify failed on 358316642304 wanted 42547 found 42515
May 10 14:45:04 goemon kernel: [ 1478.717939] parent transid verify failed on 358190825472 wanted 42547 found 42525
May 10 14:45:04 goemon kernel: [ 1478.718128] parent transid verify failed on 358316642304 wanted 42547 found 42515
May 10 14:45:04 goemon kernel: [ 1478.718131] parent transid verify failed on 358190825472 wanted 42547 found 42525
Then there would be kernel btrfs processes going on and on, sucking CPU and I/O, doing whatever they were doing. At such moments, most file reads off one of the btrfs volumes would either take very long or freeze, and un-mounting would only freeze. At that point, considering the advantages of btrfs (in my case, mostly, snapshots) were outweighed by such issues (this wasn't my first btrfs fuck-up, but by far the most dreadful) and the fact that btrfs is just so slow compared to other filesystems, I decided I didn't want to bother trying to save these filesystems from their agonizing death, and that I'd just go with ext4 on MD RAID instead. Also, I didn't want to just try (with the possibility of going through similar pain) again with a more recent kernel. Fortunately, I had backups of most of the data (the only problem being the time required to restore that amount of data), but for the few remaining things which, by force of bad timing, I didn't have a backup of, I needed to somehow get them back from these btrfs volumes. So I created new file systems to replace the btrfs volumes I could directly throw away and started recovering data from backups. I also, at the same time, tried to copy a big disk image from the remaining btrfs volume. Somehow, this worked, with the system load varying between 20 and 60 (with a lot of garbage in the logs and other services deeply impacted as well). But when trying to copy the remaining files I wanted to recover, things got worse, so I had to initiate a shutdown, and power cycle again. Since apparently the kernel wasn't going to be very helpful, the next step was to just get other things working, and get the data back some other way. What I did was to use a virtual machine to get the data off the remaining btrfs volume. The kernel could become unusable all it wanted to, I could just hard reboot without impacting the other services. In the virtual machine, things got "interesting". I did try various things I've seen on the linux-btrfs list, but nothing really did anything at all except spew some more parent transid messages. I should mention that the remaining btrfs volume was a RAID 0. To mount those, you'd mount one of the constituting disks like this:
$ mount /dev/sdb /mnt
Except that it would complain that it can't find a valid whatever (I don't remember the exact term, and I threw the VM away already), so it wouldn't mount the volume. But when mounting the other constituting disk, it would just work. Well, that's kind of understandable, but what is not is that on the next boot (I had to reboot a lot, see below), it would error out on the disk that worked previously, and work on the disk that was failing before. So, here is how things went: Ain't that fun? The good thing is that in the end, despite the pain, I recovered all that needed to be recovered. I'm in the process of recreating my build chroots from scratch, but that's not exactly difficult. It would just have taken a lot more time to recover them the same way, 50 files at a time. Side note: yes, I did try newer versions of btrfsck; yes, I did try newer kernels. No, nothing worked to make these btrfs volumes viable. No, I don't have an image of these completely fucked-up volumes.

2 May 2011

Mike Hommey: Installing Iceweasel 5.0a2 on Debian GNU/Linux

Only amd64 and i386 packages are available. Note that there is another Iceweasel version available there: "aurora". Currently, this is the same as 5.0, but when Firefox 5.0 reaches the beta stage, aurora will be 6.0a2. Please feel free to use aurora if you want to keep using these pre-beta builds.

23 April 2011

Mike Hommey: Coming soon

14 March 2011

Mike Hommey: Avoiding dependencies upon recent libstdc++

Mozilla has been distributing Firefox builds for GNU/Linux systems for a while, and 4.0 should even bring official builds for x86-64 (finally, some would say). The buildbots configuration for these builds uses gcc 4.3.3 to compile the Firefox source code. With the C++ part of gcc, it can sometimes mean side effects when using the C++ STL. Historically, the Mozilla code base hasn't made great use of the STL, most probably because 10+ years back, portability and/or compiler support wasn't very good. More recently, with the borrowing of code from the Chromium project, this changed. While the borrowed code for out-of-process plugins support didn't have an impact on libstdc++ usage, the recent addition of ANGLE had. This manifests itself in symbol version usage. These are the symbol versions required from libstdc++.so.6 on 3.6 (as given by objdump -p): And on 4.0: This means Firefox 4.0 builds from Mozilla need the GLIBCXX_3.4.9 symbol version, which was introduced with gcc 4.2. This means Firefox 4.0 builds don't work on systems with a libstdc++ older than that, while 3.6 builds would. It so happens that the system libstdc++ on the buildbots themselves is that old, which is why we set LD_LIBRARY_PATH to the appropriate location during tests. This shouldn't however be a big problem for users.

Newer gcc, new problems

As part of making Firefox faster, we're planning to switch to gcc 4.5, to benefit from better (as in working) profile guided optimization, and other compiler improvements. We actually attempted to switch to gcc 4.5 twice during the 4.0 development cycle. But various problems made us go back to gcc 4.3.3, the main contender being the use of even newer libstdc++ symbols: GLIBCXX_3.4.14 was added in gcc 4.5, making the build require a very recent libstdc++ installed on users' systems. As this wouldn't work for Mozilla builds, we attempted to build with -static-libstdc++. This option makes the resulting binary effectively contain libstdc++ itself, which means not requiring a system one. This is the usual solution used for builds such as Mozilla's, that are required to work properly on very different systems. The downside of -static-libstdc++ is that it makes the libxul.so binary larger (about 1MB larger). It looks like the linker doesn't try to eliminate the code from libstdc++ that isn't actually used. Taras has been fighting to try to get libstdc++ in a shape that would allow the linker to remove that code that is effectively dead weight for Firefox.

Why do we need these symbols?

The actual number of symbols required with the GLIBCXX_3.4.14 version is actually very low: With the addition of the following on debug builds only: The number of symbols required with the GLIBCXX_3.4.9 version is even lower: It however varies depending on the compiler version. I have seen other builds also require std::ostream& std::ostream::_M_insert(double). All these are actually internal implementation details of libstdc++. We're never calling these functions directly. I'm going to show two small examples triggering some of these requirements (that actually generalize to all of them).

The case of templates
#include <iostream>
int main()
{
    unsigned int i;
    std::cin >> i;
    return i;
}
This example, when built, requires std::istream& std::istream::_M_extract<unsigned int>(unsigned int&), but we are effectively calling std::istream& operator>>(unsigned int&). It is defined in /usr/include/c++/4.5/istream as:
template<typename _CharT, typename _Traits>
class basic_istream : virtual public basic_ios<_CharT, _Traits>
{
    basic_istream<_CharT, _Traits>& operator>>(unsigned int& __n)
    {
        return _M_extract(__n);
    }
};
And _M_extract is defined in /usr/include/c++/4.5/bits/istream.tcc as:
template<typename _CharT, typename _Traits> template<typename _ValueT>
    basic_istream<_CharT, _Traits>&
    basic_istream<_CharT, _Traits>::_M_extract(_ValueT& __v)
    {
        (...)
    }
And later on in the same file:
extern template istream& istream::_M_extract(unsigned int&);
What this all means is that libstdc++ actually provides an implementation of an instance of the template for the istream (a.k.a. basic_istream<char>) class, with an unsigned int & parameter (and some more implementations). So, when building the example program, gcc decides, instead of instantiating the template, to use the libstdc++ function. This extern definition, however, is guarded by a #if _GLIBCXX_EXTERN_TEMPLATE, so if we build with -D_GLIBCXX_EXTERN_TEMPLATE=0, we actually get gcc to instantiate the template, thus getting rid of the GLIBCXX_3.4.9 dependency. The downside is that this doesn't work so well with bigger code, because other things are hidden behind #if _GLIBCXX_EXTERN_TEMPLATE. There is however another (obvious) way to force the template instantiation: instantiating it. So adding template std::istream& std::istream::_M_extract(unsigned int&); to our code is just enough to get rid of the GLIBCXX_3.4.9 dependency. Other template cases obviously can be worked around the same way.
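As a minimal sketch of that workaround, applied to the earlier example (the explicit instantiation line is the one the paragraph above suggests adding):
#include <iostream>
// Explicit instantiation: gcc emits the template instance in our own object
// instead of relying on the one exported by libstdc++ with GLIBCXX_3.4.9.
template std::istream& std::istream::_M_extract(unsigned int&);
int main()
{
    unsigned int i;
    std::cin >> i;
    return i;
}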
The case of renamed implementations
#include <list>
int main()
{
    std::list<int> l;
    l.push_back(42);
    return 0;
}
Here, we get a dependency on std::_List_node_base::_M_hook(std::_List_node_base*) but we are effectively calling std::list<int>::push_back(int &). It is defined in /usr/include/c++/bits/stl_list.h as:
template<typename _Tp, typename _Alloc = std::allocator<_Tp> >
class list : protected _List_base<_Tp, _Alloc>
{
    void push_back(const value_type& __x)
    {
        this->_M_insert(end(), __x);
    }
};
_M_insert is defined in the same file:
template<typename... _Args>
void _M_insert(iterator __position, _Args&&... __args)
{
    _List_node<_Tp>* __tmp = _M_create_node(std::forward<_Args>(__args)...);
    __tmp->_M_hook(__position._M_node);
}
Finally, _M_hook is defined as follows:
struct _List_node_base
{
    void _M_hook(_List_node_base * const __position) throw ();
};
In gcc 4.4, however, push_back has the same definition, and while _M_insert is defined similarly, it calls __tmp->hook instead of __tmp->_M_hook. Interestingly, gcc 4.5's libstdc++ exports symbols for both std::_List_node_base::_M_hook and std::_List_node_base::hook, and the code for both methods is the same. Considering the above, a work-around for this kind of dependency is to define the newer function in our code, and make it call the old function. In our case here, this would look like:
namespace std
{
    struct _List_node_base
    {
        void hook(_List_node_base * const __position) throw ();
        void _M_hook(_List_node_base * const __position) throw ();
    };

    void _List_node_base::_M_hook(_List_node_base * const __position) throw ()
    {
        hook(__position);
    }
}
which you need to put in a separate source file, not including <list>. All in all, with a small hack, we are able to build Firefox with gcc 4.5 without requiring libstdc++ 4.5. Now, another reason to switch to gcc 4.5 was to use better optimization flags, but it turns out it makes the binaries 6MB bigger. But that's another story.

Julien Viard de Galbert: Triage X Bugs of the Week (TXBW18)

Better late than never: I restarted triaging last week, here is the report for it. In the meantime, Cyril has been triaging a lot of bugs, see Debian XSF News #7.
But there are still 533 bugs to triage. Mike Hommey recently wrote about graphs for the Debian Bug Tracking System. In particular, there is now a per-maintainer graph, so here is the X Strike Force bug graph! (Note: the graph does not filter out some bugs as we do on UDD.) The X Strike Force (still) needs you! You can have a look at the X Strike Force Bug Closing Procedure and check XSF unstable bugs sorted by date.

6 March 2011

Mike Hommey: A good reason to keep patched source in $VCS

There are a lot of different workflows to maintain Debian packages under a Version Control System. Some people prefer to only keep the debian directory, some the whole source. And in the latter category, some prefer the source tree to be patched with Debian changes, while others prefer to keep it unpatched and exclusively use debian/patches. It turns out the former and the latter don't work so well in one specific case that any package may hit some day; and that day, you realize how wrong you were not tracking the entire patched source. That happened to me recently, though instead of actually going forward and switching to tracking the patched source, I cheated and simply ditched the patch, because I didn't strictly need it. In all fairness, this is not only a case against not tracking patched source, but also a case of the 3.0 (quilt) source format being cumbersome. In my specific case, I cherry-picked an upstream patch modifying and adding some test cases related to a cherry-picked fix. One of the added test cases was a UTF-16 file. UTF-16 files can't be diffed nor patched, except in the git patch format, but quilt doesn't use nor support that. The solution around this limitation of the 3.0 (quilt) format is to include the plain modified file in the Debian tarball, and add its path to debian/source/include-binaries. On the VCS side of things, it means you have to modify the file in the source directory, and fill debian/source/include-binaries accordingly. Wait. Modify the file in the source directory? But the other files aren't! They're tracked by patches! So here you are, with all of your modifications exclusively in debian/patches except one.

5 March 2011

Mike Hommey: More graphs for the Debian Bug Tracking System

I have been maintaining Debian Bug Tracking System graphs for a few years now, though not very actively. They were initially available on people.debian.org/~glandium/bts/, but there have been some recent changes. A while ago, I started experimenting with brand new graphs on merkel.debian.org/~glandium/bts/, and when merkel was announced to be dying a few months ago, I got in touch with the QA team to know what to do with them, and it was decided we'd put them on qa.debian.org. I unfortunately didn't follow up much on this and only recently actually worked on the migration, which took place 2 weeks ago. The result is that the graphs have officially moved to qa.debian.org/data/bts/graphs/, and links on the Package Tracking System have been updated accordingly. There is now also an additional graph tracking all open bugs in the BTS, across all packages: Today, I added a new feature, allowing data for multiple arbitrary packages to be consolidated in a single graph. Such graphs can be generated with the following URL scheme (please don't over-abuse it):
http://qa.debian.org/data/bts/graphs/multi/name0,name1,name2,etc.png
As an example, here is a graph for all the bugs on the packages I (co-)maintain:
http://qa.debian.org/data/bts/graphs/multi/dehydra,diggler,iceape,iceweasel,libxml2,libxslt,livehttpheaders,mozilla-dom-inspector,nspr,nss,pyxpcom,venkman,vmfs-tools,webkit,xulrunner,zfs-fuse.png
And here are the bugs affecting Mozilla-related packages:
http://qa.debian.org/data/bts/graphs/multi/iceape,icedove,iceowl,iceweasel,nspr,nss,xulrunner.png
I guess the next step is to allow per-maintainer consolidation through URLs such as
http://qa.debian.org/data/bts/graphs/by-maint/address.png
Update: per-maintainer consolidation has been added. (Hidden message here: please help triaging these bugs)
